NSF PAR Search | NSF Public Access Repository

An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning

Xu, Haoran; Li, Shuozhe; Sikchi, Harshit; Niekum, Scott; Zhang, Amy (April 2025, International Conference on Learning Representations)

Free, publicly accessible full text available April 25, 2026
D2PO: Discriminator-Guided DPO with Response Evaluation Models

Singhal, Prasann; Lambert, Nathan; Niekum, Scott; Goyal, Tanya; Durrett, Greg (October 2024, Proceedings of the Conference on Language Modeling (COLM))

Full Text Available
Data-Efficient Policy Evaluation Through Behavior Policy Search

Hanna, Josiah P; Chandak, Yash; Thomas, Philip S; White, Martha; Stone, Peter; Niekum, Scott (October 2024, Journal of Machine Learning Research)
Ravikumar, Pradeep (Ed.)
We consider the task of evaluating a policy for a Markov decision process (MDP). The standard unbiased technique for evaluating a policy is to deploy the policy and observe its performance. We show that the data collected from deploying a different policy, commonly called the behavior policy, can be used to produce unbiased estimates with lower mean squared error than this standard technique. We derive an analytic expression for a minimal variance behavior policy -- a behavior policy that minimizes the mean squared error of the resulting estimates. Because this expression depends on terms that are unknown in practice, we propose a novel policy evaluation sub-problem, behavior policy search: searching for a behavior policy that reduces mean squared error. We present two behavior policy search algorithms and empirically demonstrate their effectiveness in lowering the mean squared error of policy performance estimates.
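The claim that data from a different behavior policy can yield lower mean squared error can be illustrated with ordinary importance sampling. The sketch below is not the paper's algorithm: it assumes a hypothetical one-step problem with two actions and deterministic, positive rewards, and it uses the standard importance-sampling fact that sampling actions in proportion to pi(a) * r(a) removes all variance in that setting. All names (`pi`, `behavior`, `rewards`) are illustrative.

```python
import random

def on_policy_estimate(pi, rewards, n, rng):
    """Standard technique: deploy the evaluation policy pi and average rewards."""
    actions = rng.choices(range(len(pi)), weights=pi, k=n)
    return sum(rewards[a] for a in actions) / n

def importance_sampling_estimate(pi, behavior, rewards, n, rng):
    """Sample from the behavior policy, reweight each reward by pi(a) / behavior(a)."""
    actions = rng.choices(range(len(behavior)), weights=behavior, k=n)
    return sum(pi[a] / behavior[a] * rewards[a] for a in actions) / n

def mean_squared_error(estimator, true_value, trials):
    """Empirical MSE of a zero-argument estimator over repeated trials."""
    return sum((estimator() - true_value) ** 2 for _ in range(trials)) / trials

rng = random.Random(0)
pi = [0.9, 0.1]          # evaluation policy (assumed toy values)
rewards = [1.0, 10.0]    # deterministic reward per action (assumed toy values)
true_value = sum(p * r for p, r in zip(pi, rewards))

# Behavior policy proportional to pi(a) * r(a); every reweighted sample then
# equals true_value, so the estimator has no variance in this toy setting.
behavior = [p * r / true_value for p, r in zip(pi, rewards)]

mse_on_policy = mean_squared_error(
    lambda: on_policy_estimate(pi, rewards, 10, rng), true_value, 2000)
mse_behavior = mean_squared_error(
    lambda: importance_sampling_estimate(pi, behavior, rewards, 10, rng), true_value, 2000)
print(f"on-policy MSE: {mse_on_policy:.4f}, behavior-policy MSE: {mse_behavior:.2e}")
```

This behavior policy needs the true value and the rewards, which are unknown in practice. That dependence is why the abstract frames finding a good behavior policy as a search problem.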
Full Text Available
Contrastive Preference Learning: Learning from Human Feedback without RL

Hejna, Joey; Rafailov, Rafael; Sikchi, Harshit; Finn, Chelsea; Niekum, Scott; Knox, W Bradley; Sadigh, Dorsa (May 2024, International Conference on Learning Representations (ICLR))

Full Text Available
Contrastive Preference Learning: Learning from Human Feedback without RL

Hejna, Joey; Rafailov, Rafael; Sikchi, Harshit; Finn, Chelsea; Niekum, Scott; Knox, Bradley; Sadigh, Dorsa (May 2024, International Conference on Learning Representations (ICLR))

Full Text Available
Fairness Guarantees Under Demographic Shift

Giguere, Stephen; Metevier, Blossom; Brun, Yuriy; Castro da Silva, Bruno; Thomas, Philip; Niekum, Scott (April 2022, International Conference on Learning Representations)

Full Text Available
Fairness Guarantees under Demographic Shift

Giguere, Stephen; Metevier, Blossom; Brun, Yuriy; da Silva, Bruno Castro; Thomas, Philip S.; Niekum, Scott (April 2022, Proceedings of the 10th International Conference on Learning Representations (ICLR))

Recent studies found that using machine learning for social applications can lead to injustice in the form of racist, sexist, and otherwise unfair and discriminatory outcomes. To address this challenge, recent machine learning algorithms have been designed to limit the likelihood such unfair behavior occurs. However, these approaches typically assume the data used for training is representative of what will be encountered in deployment, which is often untrue. In particular, if certain subgroups of the population become more or less probable in deployment (a phenomenon we call demographic shift), prior work's fairness assurances are often invalid. In this paper, we consider the impact of demographic shift and present a class of algorithms, called Shifty algorithms, that provide high-confidence behavioral guarantees that hold under demographic shift when data from the deployment environment is unavailable during training. Shifty, the first technique of its kind, demonstrates an effective strategy for designing algorithms to overcome demographic shift's challenges. We evaluate Shifty using the UCI Adult Census dataset, as well as a real-world dataset of university entrance exams and subsequent student success. We show that the learned models avoid bias under demographic shift, unlike existing methods. Our experiments demonstrate that our algorithm's high-confidence fairness guarantees are valid in practice and that our algorithm is an effective tool for training models that are fair when demographic shift occurs.
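To make the notion of demographic shift concrete, the sketch below reweights training samples by the ratio of deployment to training subgroup probabilities. It is only an illustration and not the Shifty algorithm: it assumes the deployment subgroup probabilities are known and that behavior within each subgroup is unchanged, and it provides no high-confidence guarantee. The function `reweighted_mean` and the toy data are hypothetical.

```python
from collections import Counter

def reweighted_mean(values, groups, deployment_probs):
    """Estimate the deployment-time mean of `values` from training samples.

    Each sample is weighted by q(g) / p(g), where p is the empirical training
    probability of its subgroup g and q is the assumed deployment probability.
    """
    n = len(values)
    train_probs = {g: c / n for g, c in Counter(groups).items()}
    total = sum(
        deployment_probs.get(g, 0.0) / train_probs[g] * v
        for v, g in zip(values, groups)
    )
    return total / n

# Toy data: 1 marks a model error. Subgroup "a" has 1 error in 8 samples,
# subgroup "b" has 1 error in 2 samples.
groups = ["a"] * 8 + ["b"] * 2
errors = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]

train_rate = sum(errors) / len(errors)
shifted_rate = reweighted_mean(errors, groups, {"a": 0.5, "b": 0.5})
print(f"training error rate: {train_rate}, rate under assumed shift: {shifted_rate}")
```

In this toy data the error rate rises once subgroup "b" becomes more common, which shows how a quantity measured on training data can stop holding after the subgroup mix changes.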
Full Text Available
Importance sampling in reinforcement learning with an estimated behavior policy

Hanna, Josiah; Niekum, Scott; Stone, Peter (June 2021, Machine Learning)
Full Text Available
Fairness Guarantees Under Demographic Shift

Giguere, Stephen; Metevier, Blossom; Castro da Silva, Bruno; Brun, Yuriy; Thomas, Philip; Niekum, Scott (January 2022, International Conference on Learning Representations)

Full Text Available
